Hive, Pig & HBase Performance Evaluation for Data Processing Applications

Authors

  • Vaishali Chauhan
  • Meenakshi Sharma
Abstract

Information extraction has received significant attention due to the rapid growth of unstructured data. Researchers urgently need a low-cost, scalable, easy-to-use, and fault-tolerant platform for processing large volumes of data, so it is important to evaluate MapReduce-based frameworks for data processing applications. This paper presents a comparative study of HBase, Hive, and Pig. We measure the processing time of HBase, Hive, and Pig on a dataset using simple queries, observe their performance, and evaluate the results accordingly.
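The abstract compares the engines by the processing time of simple queries. A minimal Python sketch of such a timing comparison is shown below, assuming each engine's query can be wrapped in a zero-argument callable. The helper names `cli_runner` and `time_queries`, the use of the median over repeated runs, and the example table `logs` are illustrative assumptions, not the paper's method; the actual queries and dataset are not given in the abstract.

```python
import statistics
import subprocess
import time
from typing import Callable, Dict, List


def cli_runner(cmd: List[str]) -> Callable[[], None]:
    """Hypothetical helper: wrap a command line (e.g. a Hive or Pig call) as a callable."""
    def run() -> None:
        # check=True raises CalledProcessError if the command fails,
        # so a failed query is never recorded as a (fast) timing.
        subprocess.run(cmd, check=True, capture_output=True)
    return run


def time_queries(runners: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Hypothetical helper: median wall-clock seconds of each runner over `repeats` runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results: Dict[str, float] = {}
    for name, run in runners.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            samples.append(time.perf_counter() - start)
        # The median damps outliers such as a cold first run.
        results[name] = statistics.median(samples)
    return results


if __name__ == "__main__":
    # Stand-in workloads for illustration only; a real comparison might use, e.g.,
    # cli_runner(["hive", "-e", "SELECT COUNT(*) FROM logs"]) (table name assumed).
    demo = {
        "engine_a": lambda: time.sleep(0.01),
        "engine_b": lambda: time.sleep(0.02),
    }
    for engine, seconds in time_queries(demo).items():
        print(f"{engine}: {seconds:.3f} s")
```

Running the same query several times per engine and reporting one summary statistic keeps the comparison less sensitive to a single slow run; any such harness measures end-to-end time, including job start-up, rather than pure query execution.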

Similar resources

Distributed RDF Triple Store Using HBase and Hive

The growth of web data has presented new challenges regarding the ability to effectively query RDF data. Traditional relational database systems efficiently scale and query distributed data. With the development of Hadoop, its implementation of the MapReduce framework, along with HBase, a NoSQL data store, the semantics of processing and querying data has changed. Given the existing structure of ...

Data Mining over Large Datasets Using Hadoop in Cloud Environment

There is a drastic growth of data in web applications and social networking, and such data are referred to as Big Data. Hive queries integrated with Hadoop are used to generate report analyses for thousands of datasets. Retrieving those datasets consumes a large amount of time, and performance analysis is lacking. To overcome this problem the Market Basket Analysis ...

Data Quality for Web Log Data Using a Hadoop Environment

Solving data quality problems is important for data warehouse construction and operation. This paper is based on developing a web log warehouse. It proposes a data quality problem methodology for data preprocessing within the log warehouse. It provides a hierarchical data warehouse architecture that is suitable for resource saving and ad hoc requirements. The data preprocessing is completed usi...

Rank Join Queries in NoSQL Databases

Rank (i.e., top-k) join queries play a key role in modern analytics tasks. However, despite their importance and unlike centralized settings, they have been completely overlooked in cloud NoSQL settings. We attempt to fill this gap: We contribute a suite of solutions and study their performance comprehensively. Baseline solutions are offered using SQL-like languages (like Hive and Pig), based on...

Optimizing data management for MapReduce applications on large-scale distributed infrastructures

ions were developed based on MapReduce, with the goal of providing a simple-to-use interface for expressing database-like queries [64, 6]. Bioinformatics is one of the numerous research domains that employ MapReduce to model their algorithms [69, 58, 56]. As an example, CloudBurst [69] is a MapReduce-based algorithm for mapping next-generation sequence data to the human genome and other referenc...

Publication date: 2016